休息状态功能磁共振成像(FMRI)是一种强大的成像技术,用于研究UTETO脑功能的功能发展。然而,胎儿的不可预测和过度运动具有有限的临床应用,因为它导致可以系统地改变了功能连接模式的大量信号波动。以前的研究专注于在大胎儿头部运动的情况下的运动参数的准确估计,并在每个时间点使用3D单步插值方法来恢复无动态的FMRI图像。这并不保证重建的图像对应于给定获取的数据的FMRI时间序列的最小错误表示。在这里,我们提出了一种基于胎儿FMRI散射切片的四维迭代重建的新技术。在一组真正的临床FMRI胎儿上定量评估所提出的方法的准确性。结果表明与传统的3D插值方法相比,重建质量的改进。
translated by 谷歌翻译
UTERO中显影人脑的定量评估至关重要,以完全理解神经发育。因此,正在开发自动化的多组织胎儿脑分段算法,这反过来需要训练注释数据。然而,可用的注释的胎儿脑数据集是有限的数量和异质性,妨碍稳健的细分域的域适应策略。在这种情况下,我们使用Fabian,胎儿脑磁共振采集数值模拟,模拟胎儿脑的各种现实磁共振图像以及其类标签。我们证明,这些多种合成注释数据,无成本生成并使用目标超分辨率技术进一步重建,可以成功地用于分段七种脑组织的深度学习方法的域改性。总体而言,分割的准确性显着增强,特别是在皮质灰质,白质,小脑,深灰色物质和脑干中。
translated by 谷歌翻译
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between the image reconstruction fidelity and image editing quality remains an open challenge. The low-rate latent spaces are limited in their expressiveness power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degradation in editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features for adapting to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
translated by 谷歌翻译
Conventional methods for human motion synthesis are either deterministic or struggle with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, i.e., a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can generate long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework through our scheduled weighting strategy. The learned latent space can be used for several interactive motion editing applications -- like inbetweening, seed conditioning, and text-based editing -- thus, providing crucial abilities for virtual character animation and robotics. Through comprehensive quantitative evaluations and a perceptual user study, we demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature. We urge the reader to watch our supplementary video and visit https://vcai.mpi-inf.mpg.de/projects/MoFusion.
translated by 谷歌翻译
Dynamic neural networks (DyNNs) have become viable techniques to enable intelligence on resource-constrained edge devices while maintaining computational efficiency. In many cases, the implementation of DyNNs can be sub-optimal due to its underlying backbone architecture being developed at the design stage independent of both: (i) the dynamic computing features, e.g. early exiting, and (ii) the resource efficiency features of the underlying hardware, e.g., dynamic voltage and frequency scaling (DVFS). Addressing this, we present HADAS, a novel Hardware-Aware Dynamic Neural Architecture Search framework that realizes DyNN architectures whose backbone, early exiting features, and DVFS settings have been jointly optimized to maximize performance and resource efficiency. Our experiments using the CIFAR-100 dataset and a diverse set of edge computing platforms have seen HADAS dynamic models achieve up to 57% energy efficiency gains compared to the conventional dynamic ones while maintaining the desired level of accuracy scores. Our code is available at https://github.com/HalimaBouzidi/HADAS
translated by 谷歌翻译
One of the main problems in applying deep learning techniques to recognize activities of daily living (ADLs) based on inertial sensors is the lack of appropriately large labelled datasets to train deep learning-based models. A large amount of data would be available due to the wide spread of mobile devices equipped with inertial sensors that can collect data to recognize human activities. Unfortunately, this data is not labelled. The paper proposes DISC (Deep Inertial Sensory Clustering), a DL-based clustering architecture that automatically labels multi-dimensional inertial signals. In particular, the architecture combines a recurrent AutoEncoder and a clustering criterion to predict unlabelled human activities-related signals. The proposed architecture is evaluated on three publicly available HAR datasets and compared with four well-known end-to-end deep clustering approaches. The experiments demonstrate the effectiveness of DISC on both clustering accuracy and normalized mutual information metrics.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Self-attention is of vital importance in semantic segmentation as it enables modeling of long-range context, which translates into improved performance. We argue that it is equally important to model short-range context, especially to tackle cases where not only the regions of interest are small and ambiguous, but also when there exists an imbalance between the semantic classes. To this end, we propose Masked Supervised Learning (MaskSup), an effective single-stage learning paradigm that models both short- and long-range context, capturing the contextual relationships between pixels via random masking. Experimental results demonstrate the competitive performance of MaskSup against strong baselines in both binary and multi-class segmentation tasks on three standard benchmark datasets, particularly at handling ambiguous regions and retaining better segmentation of minority classes with no added inference cost. In addition to segmenting target regions even when large portions of the input are masked, MaskSup is also generic and can be easily integrated into a variety of semantic segmentation methods. We also show that the proposed method is computationally efficient, yielding an improved performance by 10\% on the mean intersection-over-union (mIoU) while requiring $3\times$ less learnable parameters.
translated by 谷歌翻译
Previous virtual try-on methods usually focus on aligning a clothing item with a person, limiting their ability to exploit the complex pose, shape and skin color of the person, as well as the overall structure of the clothing, which is vital to photo-realistic virtual try-on. To address this potential weakness, we propose a fill in fabrics (FIFA) model, a self-supervised conditional generative adversarial network based framework comprised of a Fabricator and a unified virtual try-on pipeline with a Segmenter, Warper and Fuser. The Fabricator aims to reconstruct the clothing image when provided with a masked clothing as input, and learns the overall structure of the clothing by filling in fabrics. A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to Warper in an effort to warp and refine the target clothing. We also propose to use a multi-scale structural constraint to enforce global context at multiple scales while warping the target clothing to better fit the pose and shape of the person. Extensive experiments demonstrate that our FIFA model achieves state-of-the-art results on the standard VITON dataset for virtual try-on of clothing items, and is shown to be effective at handling complex poses and retaining the texture and embroidery of the clothing.
translated by 谷歌翻译
家庭中的移动操纵器可以为患有严重运动障碍的人提供越来越多的自治权,他们在没有照料者的帮助下通常无法完成日常生活(ADL)的活动。辅助移动操纵器的远距离运行可以使患有运动障碍的人能够独立执行自我保健和家庭任务,但是有限的运动功能会阻碍人们与机器人接触的能力。在这项工作中,我们介绍了一个独特的基于惯性的可穿戴辅助界面,该辅助界面嵌入了熟悉的头饰服装中,适用于具有严重运动障碍的人,可以通过移动操纵器进行远程处理和执行身体任务。我们评估了这种可穿戴的界面(n = 16)和有运动障碍的个体(n = 2),用于执行ADL和日常家庭任务。我们的结果表明,可穿戴界面使参与者能够完成错误率,高度可感知的易用性和低工作负载度量的身体任务。总体而言,这种基于惯性的可穿戴设备是一种新的辅助接口选项,可控制家庭中移动操纵器。
translated by 谷歌翻译